Take-home Ex03
pacman::p_load(olsrr, ggstatsplot, sf, tmap, tidyverse, performance, see, sfdep, GWmodel, lubridate)
Overview and Objectives
In this take-home exercise, my aim is to prototype and evaluate the visualisation of pages and components in our Shiny application.
My responsibility is the Geographically Weighted Regression (GWR) on the districts of Malaysia. The GWR focuses on modelling the crime rate of each district against household income, income inequality and poverty.
Data
Income Inequality Data: Household income inequality by district (https://data.gov.my/data-catalogue/hh_inequality_district)
Income Data: Mean and median gross monthly by district (https://data.gov.my/data-catalogue/hh_income_district)
Poverty Data: Poverty rates by district (https://data.gov.my/data-catalogue/hh_poverty_district)
Crime Data: Crime rates by district (https://data.gov.my/data-catalogue/crime_district)
Malaysia - Subnational Administrative Boundaries: (https://data.humdata.org/dataset/cod-ab-mys?)
Packages
olsrr: Provides tools for building and validating OLS regression models with stepwise selection and diagnostics.
ggstatsplot: Extends ggplot2 with statistical tests and data visualization in a single, user-friendly syntax.
sf: Provides a standardized way to work with spatial vector data (points, lines, polygons).
tmap: Creates thematic maps with interactive and static modes for spatial visualization.
tidyverse: A collection of packages for easy data manipulation, visualization, and analysis.
performance: Offers tools to assess, validate, and compare regression models.
see: Supplies visual themes and color palettes to enhance ggplot2 visualizations.
sfdep: Provides spatial dependency analysis tools specifically for sf objects.
GWmodel: Implements geographically weighted regression (GWR) and spatial analysis methods.
lubridate: Simplifies the handling and manipulation of dates and times.
UI Design
Motivation
Using Geographically Weighted Regression (GWR) to explore the relationship between crime rates and socioeconomic factors like household income, income inequality, and poverty allows us to understand how these factors influence crime spatially rather than uniformly across all regions. This approach is motivated by the premise that socio-economic conditions impact crime differently depending on local contexts—certain areas with high poverty may experience more crime due to limited economic opportunities, while other regions might be more influenced by income inequality. By applying GWR, we can identify these spatial variations and better understand where and how each factor contributes to crime. This spatial insight not only refines our understanding of the socio-economic drivers of crime but also supports the development of targeted interventions, enabling policymakers to address crime in a more focused and locally relevant manner.
A Shiny app could visualize the spatial relationship between crime rates and socioeconomic factors like household income, income inequality, and poverty by displaying interactive, region-specific maps. Users could toggle between map layers to see how each factor correlates with crime across areas, revealing patterns that vary locally. Adjustable settings, such as bandwidth, would allow users to explore the data at different spatial scales, providing insights into whether socioeconomic conditions affect crime more strongly in specific regions. Real-time plots and statistics that update based on user inputs would further enhance understanding, making complex spatial relationships clear and actionable for policymakers and communities aiming to develop targeted crime prevention strategies.
Overall Layout

For the Shiny application, three main components will be used for our pages: headerPanel, sidebarPanel, and mainPanel.
Header Panel : This is the topmost part of the UI, where we can put a description of the application or a navbar for navigating between pages. Each page leads to another group member's part of this project.
Sidebar Panel: This panel mainly consists of the input controls that the user can adjust to change the map output in the Main Panel.
Main Panel : This is the primary area of the application and it typically contains outputs. The main panel displays the output (maps, plots, tables, etc.) based on the input given in the sidebar panel. Tabsets will allow users to switch between views.
The various details of the UI will be explored as we try to come up with the data inputs below.
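As a rough illustration of the three-panel layout described above, the following is a minimal sketch only. The title, input and output IDs, labels and tab names are placeholders chosen for this example, not the final design, and plotOutput() stands in for whatever map output the application ends up using.

```r
library(shiny)

# Layout sketch only: IDs, labels and tab names are placeholders
ui <- fluidPage(
  headerPanel("GWR of crime rates in Malaysia"),
  sidebarLayout(
    sidebarPanel(
      # Input controls live here
      selectInput("crime_type", "Type of crime",
                  choices = c("all", "murder", "causing_injury"))
    ),
    mainPanel(
      # Tabsets let the user switch between views
      tabsetPanel(
        tabPanel("GWR", plotOutput("gwr_map")),
        tabPanel("Correlation", plotOutput("corr_plot"))
      )
    )
  )
)
```

Keeping every control in the sidebar and every output in a tab of the main panel means each new view only needs one extra tabPanel().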
Data
Aspatial
read_csv() of the readr package (loaded as part of tidyverse) will be used to import the aspatial data into the R environment.
crime <- read_csv("data/aspatial/crime_district.csv")
Rows: 19152 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): state, district, category, type
dbl (1): crimes
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
inequality <- read_csv("data/aspatial/inequality_district.csv")
Rows: 318 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): state, district
dbl (1): gini
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
income <- read_csv("data/aspatial/income_district.csv")
Rows: 318 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): state, district
dbl (2): income_mean, income_median
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
poverty <- read_csv("data/aspatial/poverty_district.csv")
Rows: 318 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): state, district
dbl (2): poverty_absolute, poverty_relative
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Geospatial
st_read() of sf package will be used to import Malaysia shapefile into R environment in order to get the polygons representing the borders of the districts of Malaysia.
msia <- st_read(dsn = "data/geospatial", layer = "mys_admbnda_adm2_unhcr_20210211")
Reading layer `mys_admbnda_adm2_unhcr_20210211' from data source
`C:\ryanpxp\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 144 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 99.64072 ymin: 0.855001 xmax: 119.2697 ymax: 7.380556
Geodetic CRS: WGS 84
Data Wrangling
Before the UI prototyping can be done let’s see what type of data we are dealing with so that we can better plan for the UI components to be used.
Checking the CRS of the files:
st_crs(msia)
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
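The data uses WGS 84 (EPSG:4326), a geographic CRS measured in degrees. Let's transform it to EPSG:3168, a projected CRS measured in metres. Note that its stated area of use is West Malaysia (see AREA in the output below), so distances for Sabah and Sarawak may be less accurate.
msia <- msia %>%
  st_transform(crs = 3168)
st_crs(msia)
Coordinate Reference System: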
User input: EPSG:3168
wkt:
PROJCRS["Kertau (RSO) / RSO Malaya (m)",
BASEGEOGCRS["Kertau (RSO)",
DATUM["Kertau (RSO)",
ELLIPSOID["Everest 1830 (RSO 1969)",6377295.664,300.8017,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4751]],
CONVERSION["Rectified Skew Orthomorphic Malaya Grid (metre)",
METHOD["Hotine Oblique Mercator (variant A)",
ID["EPSG",9812]],
PARAMETER["Latitude of projection centre",4,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8811]],
PARAMETER["Longitude of projection centre",102.25,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8812]],
PARAMETER["Azimuth of initial line",323.0257905,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8813]],
PARAMETER["Angle from Rectified to Skew Grid",323.130102361111,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8814]],
PARAMETER["Scale factor on initial line",0.99984,
SCALEUNIT["unity",1],
ID["EPSG",8815]],
PARAMETER["False easting",804670.24,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["Malaysia - West Malaysia onshore."],
BBOX[1.21,99.59,6.72,104.6]],
ID["EPSG",3168]]
Quick glance at the geospatial data:
glimpse(msia)
Rows: 144
Columns: 15
$ ADM2_EN <chr> "Batu Pahat", "Johor Bahru", "Kluang", "Kota Tinggi", "Kula…
$ ADM2_PCODE <chr> "MY0101", "MY0102", "MY0103", "MY0104", "MY0105", "MY0106",…
$ ADM2_REF <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM2ALT1EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM2ALT2EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM1_EN <chr> "Johor", "Johor", "Johor", "Johor", "Johor", "Johor", "Joho…
$ ADM1_PCODE <chr> "MY01", "MY01", "MY01", "MY01", "MY01", "MY01", "MY01", "MY…
$ ADM0_EN <chr> "Malaysia", "Malaysia", "Malaysia", "Malaysia", "Malaysia",…
$ ADM0_PCODE <chr> "MY", "MY", "MY", "MY", "MY", "MY", "MY", "MY", "MY", "MY",…
$ date <date> 2020-12-02, 2020-12-02, 2020-12-02, 2020-12-02, 2020-12-02…
$ validOn <date> 2021-02-11, 2021-02-11, 2021-02-11, 2021-02-11, 2021-02-11…
$ validTo <date> -001-11-30, -001-11-30, -001-11-30, -001-11-30, -001-11-30…
$ Shape_Leng <dbl> 1.8566472, 1.8288893, 2.3609919, 3.1527941, 1.0739527, 1.40…
$ Shape_Area <dbl> 0.16104119, 0.08016539, 0.23358469, 0.27830606, 0.06154171,…
$ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((556714.6 19..., MULTIPOLYGON (…
Quick glance at the crime data:
glimpse(crime)
Rows: 19,152
Columns: 6
$ state <chr> "Malaysia", "Malaysia", "Malaysia", "Malaysia", "Malaysia", "…
$ district <chr> "All", "All", "All", "All", "All", "All", "All", "All", "All"…
$ category <chr> "assault", "assault", "assault", "assault", "assault", "assau…
$ type <chr> "all", "all", "all", "all", "all", "all", "all", "all", "caus…
$ date <date> 2016-01-01, 2017-01-01, 2018-01-01, 2019-01-01, 2020-01-01, …
$ crimes <dbl> 22327, 21366, 16902, 16489, 13279, 11495, 10348, 10453, 5531,…
Looking at the crime object, we can see that it includes aggregate rows: “All” for district and “all” for type of crime. We filter out the district-level aggregates so that the data is easier to work with and can be joined with the geospatial data.
crime_filtered <- crime %>% filter(district != "All")
Date conversion
Taking a look at the other aspatial data:
glimpse(income)
Rows: 318
Columns: 5
$ state <chr> "Johor", "Johor", "Johor", "Johor", "Johor", "Johor", "J…
$ district <chr> "Batu Pahat", "Batu Pahat", "Johor Bahru", "Johor Bahru"…
$ date <date> 2019-01-01, 2022-01-01, 2019-01-01, 2022-01-01, 2019-01…
$ income_mean <dbl> 7392, 7419, 9315, 9869, 5953, 6461, 6982, 7529, 8602, 91…
$ income_median <dbl> 6504, 6347, 7342, 8232, 4933, 5204, 5475, 6227, 7536, 74…
glimpse(inequality)
Rows: 318
Columns: 4
$ state <chr> "Johor", "Johor", "Johor", "Johor", "Johor", "Johor", "Johor"…
$ district <chr> "Batu Pahat", "Batu Pahat", "Johor Bahru", "Johor Bahru", "Kl…
$ date <date> 2019-01-01, 2022-01-01, 2019-01-01, 2022-01-01, 2019-01-01, …
$ gini <dbl> 0.295, 0.338, 0.388, 0.359, 0.333, 0.354, 0.361, 0.343, 0.324…
glimpse(poverty)
Rows: 318
Columns: 5
$ state <chr> "Johor", "Johor", "Johor", "Johor", "Johor", "Johor",…
$ district <chr> "Batu Pahat", "Batu Pahat", "Johor Bahru", "Johor Bah…
$ date <date> 2019-01-01, 2022-01-01, 2019-01-01, 2022-01-01, 2019…
$ poverty_absolute <dbl> 2.9, 5.1, 3.3, 3.7, 5.0, 7.2, 6.0, 5.0, 3.2, 0.4, 12.…
$ poverty_relative <dbl> 9.0, 19.4, 12.8, 10.4, 24.9, 27.4, 20.8, 17.0, 10.1, …
Looking at the aspatial data, we can see that every date is the first day of a year, so the month and day do not matter. We shall convert the column to just the year.
Since the poverty, income and income inequality data only cover 2019 and 2022, we keep only the 2019 and 2022 crime data to match.
crime_filtered <- crime_filtered %>%
  mutate(year = year(date)) %>%
  select(-date) %>%
  filter(year %in% c(2019, 2022))
income <- income %>%
  mutate(year = year(date)) %>%
  select(-date)
poverty <- poverty %>%
  mutate(year = year(date)) %>%
  select(-date)
inequality <- inequality %>%
  mutate(year = year(date)) %>%
  select(-date)
Check for empty rows
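# Note: date was replaced by year above, so is.na(date) tests the function
# date() rather than a column. This causes the warning below, and na_date is
# not a meaningful check; is.na(year) is the intended test.
na <- crime_filtered %>%
  summarise(na_district = sum(is.na(district)),
            na_category = sum(is.na(category)),
            na_type = sum(is.na(type)),
            na_date = sum(is.na(date)),
            na_crimes = sum(is.na(crimes))
  )
Warning: There was 1 warning in `summarise()`.
ℹ In argument: `na_date = sum(is.na(date))`.
Caused by warning in `is.na()`:
! is.na() applied to non-(list or vector) of type 'closure'
print(na)
# A tibble: 1 × 5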
na_district na_category na_type na_date na_crimes
<int> <int> <int> <int> <int>
1 0 0 0 0 0
na <- inequality %>%
  summarise(na_district = sum(is.na(district)),
            na_date = sum(is.na(date)),
            na_gini = sum(is.na(gini))
  )
Warning: There was 1 warning in `summarise()`.
ℹ In argument: `na_date = sum(is.na(date))`.
Caused by warning in `is.na()`:
! is.na() applied to non-(list or vector) of type 'closure'
print(na)
# A tibble: 1 × 3
na_district na_date na_gini
<int> <int> <int>
1 0 0 0
na <- poverty %>%
  summarise(na_district = sum(is.na(district)),
            na_date = sum(is.na(date)),
            na_poverty_absolute = sum(is.na(poverty_absolute)),
            na_poverty_relative = sum(is.na(poverty_relative)),
  )
Warning: There was 1 warning in `summarise()`.
ℹ In argument: `na_date = sum(is.na(date))`.
Caused by warning in `is.na()`:
! is.na() applied to non-(list or vector) of type 'closure'
print(na)
# A tibble: 1 × 4
na_district na_date na_poverty_absolute na_poverty_relative
<int> <int> <int> <int>
1 0 0 0 0
na <- income %>%
  summarise(na_district = sum(is.na(district)),
            na_date = sum(is.na(date)),
            na_poverty_income_mean = sum(is.na(income_mean)),
            na_poverty_income_median = sum(is.na(income_median)),
  )
Warning: There was 1 warning in `summarise()`.
ℹ In argument: `na_date = sum(is.na(date))`.
Caused by warning in `is.na()`:
! is.na() applied to non-(list or vector) of type 'closure'
print(na)
# A tibble: 1 × 4
na_district na_date na_poverty_income_mean na_poverty_income_median
<int> <int> <int> <int>
1 0 0 0 0
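The four checks above name each column by hand, which is how the dropped date column ended up being referenced. One possible alternative is to count missing values across every column at once with across(). The sketch below uses a small made-up table (toy and its values are purely illustrative) rather than the real data.

```r
library(dplyr)

# Toy stand-in for one of the aspatial tables (values are made up)
toy <- tibble(
  district = c("A", "B", NA),
  year     = c(2019, 2022, 2022),
  gini     = c(0.30, NA, 0.35)
)

# One NA count per column, whatever the columns are called
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

print(na_counts)
```

Because no column is named explicitly, the same expression could be applied unchanged to crime_filtered, income, inequality and poverty.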
Mismatched district names
District names in crime_filtered that are not in district names of msia.
setdiff(unique(crime_filtered$district), unique(msia$ADM2_EN))
 [1] "Iskandar Puteri" "Johor Bahru Selatan" "Johor Bahru Utara"
[4] "Nusajaya" "Seri Alam" "Bandar Bharu"
[7] "Nilai" "Cameron Highland" "Kuala Lipis"
[10] "Batu Gajah" "Gerik" "Ipoh"
[13] "Manjung" "Pengkalan Hulu" "Selama"
[16] "Sungai Siput" "Taiping" "Tanjong Malim"
[19] "Tapah" "Arau" "Kangar"
[22] "Padang Besar" "Seberang Perai Selatan" "Seberang Perai Tengah"
[25] "Seberang Perai Utara" "Kota Kinabatangan" "Kota Samarahan"
[28] "Matu Daro" "Padawan" "Ampang Jaya"
[31] "Hulu Selangor" "Kajang" "Klang Selatan"
[34] "Klang Utara" "Petaling Jaya" "Serdang"
[37] "Sg. Buloh" "Shah Alam" "Subang Jaya"
[40] "Sungai Buloh" "Brickfields" "Cheras"
[43] "Dang Wangi" "Sentul" "Wangsa Maju"
District names in income that are not in district names of msia.
setdiff(unique(income$district), unique(msia$ADM2_EN))
 [1] "Kulai" "Tangkak" "Kecil Lojing"
[4] "Bagan Datuk" "Hulu Perak" "Larut dan Matang"
[7] "Manjung" "Muallim" "Selama"
[10] "Seberang Perai Selatan" "Seberang Perai Tengah" "Seberang Perai Utara"
[13] "Kalabakan" "Telupid" "Beluru"
[16] "Bukit Mabong" "Kabong" "Maradong"
[19] "Pusa" "Sebauh" "Subis"
[22] "Tanjung Manis" "Tebedu" "Telang Usan"
[25] "Kuala Nerus" "W.P. Kuala Lumpur"
District names in inequality that are not in district names of msia.
setdiff(unique(inequality$district), unique(msia$ADM2_EN))
 [1] "Kulai" "Tangkak" "Kecil Lojing"
[4] "Bagan Datuk" "Hulu Perak" "Larut dan Matang"
[7] "Manjung" "Muallim" "Selama"
[10] "Seberang Perai Selatan" "Seberang Perai Tengah" "Seberang Perai Utara"
[13] "Kalabakan" "Telupid" "Beluru"
[16] "Bukit Mabong" "Kabong" "Maradong"
[19] "Pusa" "Sebauh" "Subis"
[22] "Tanjung Manis" "Tebedu" "Telang Usan"
[25] "Kuala Nerus" "W.P. Kuala Lumpur"
District names in poverty that are not in district names of msia.
setdiff(unique(poverty$district), unique(msia$ADM2_EN))
 [1] "Kulai" "Tangkak" "Kecil Lojing"
[4] "Bagan Datuk" "Hulu Perak" "Larut dan Matang"
[7] "Manjung" "Muallim" "Selama"
[10] "Seberang Perai Selatan" "Seberang Perai Tengah" "Seberang Perai Utara"
[13] "Kalabakan" "Telupid" "Beluru"
[16] "Bukit Mabong" "Kabong" "Maradong"
[19] "Pusa" "Sebauh" "Subis"
[22] "Tanjung Manis" "Tebedu" "Telang Usan"
[25] "Kuala Nerus" "W.P. Kuala Lumpur"
There is no easy way to fix this other than to look up each district mentioned in the aspatial data and map it as closely as possible to a district in the sf file.
We shall create a function for renaming the districts in the aspatial data to match those of the geospatial data:
rename_districts <- function(data) {
data <- data %>%
mutate(district = case_when(
district %in% c("Iskandar Puteri", "Nusajaya", "Johor Bahru Selatan", "Johor Bahru Utara", "Seri Alam") ~ "Johor Bahru",
district == "Bandar Bharu" ~ "Bandar Baharu",
district %in% c("Brickfields", "Cheras", "Dang Wangi", "Sentul", "Wangsa Maju","W.P. Kuala Lumpur") ~ "WP. Kuala Lumpur",
district == "Nilai" ~ "Seremban",
district == "Cameron Highland" ~ "Cameron Highlands",
district == "Kuala Lipis" ~ "Lipis",
district %in% c("Batu Gajah", "Ipoh") ~ "Kinta",
district == "Gerik" ~ "Ulu Perak",
district == "Manjung" ~ "Manjung (Dinding)",
district == "Pangkalan Hulu" ~ "Ulu Perak",
district %in% c("Selama", "Taiping", "Larut dan Matang") ~ "Larut Dan Matang",
district == "Sungai Siput" ~ "Kuala Kangsar",
district %in% c("Tanjong Malim", "Tapah", "Bagan Datuk", "Muallim") ~ "Batang Padang",
district %in% c("Arau", "Kangar", "Padang Besar") ~ "Perlis",
state == "Pulau Pinang" & district == "Seberang Perai Selatan" ~ "S.P.Selatan",
district == "Seberang Perai Tengah" ~ "S.P. Tengah",
district == "Seberang Perai Utara" ~ "S.P. Utara",
district == "Ampang Jaya" ~ "Gombak",
district == "Kajang" ~ "Ulu Langat",
district %in% c("Pengkalan Hulu","Hulu Perak") ~ "Ulu Perak",
district == "Hulu Selangor" ~ "Ulu Selangor",
district %in% c("Klang Selatan", "Klang Utara") ~ "Klang",
district %in% c("Petaling Jaya", "Serdang", "Sg. Buloh", "Shah Alam", "Subang Jaya", "Sungai Buloh") ~ "Petaling",
district == "Kota Kinabatangan" ~ "Kinabatangan",
district == "Kota Samarahan" ~ "Samarahan",
district %in% c("Matu Daro", "Tanjung Manis") ~ "Mukah",
district == "Padawan" ~ "Kuching",
district == "Kulai" ~ "Kulaijaya",
district == "Tangkak" ~ "Ledang",
district == "Kecil Lojing" ~ "Gua Musang",
district == "Kalabakan" ~ "Tawau",
district == "Telupid" ~ "Beluran",
district == "Beluru" ~ "Miri",
district == "Bukit Mabong" ~ "Kapit",
district == "Kabong" ~ "Saratok",
district == "Maradong" ~ "Meradong",
district == "Pusa" ~ "Betong",
district == "Sebauh" ~ "Bintulu",
district == "Subis" ~ "Miri",
district == "Tebedu" ~ "Serian",
district == "Telang Usan" ~ "Marudi",
district == "Kuala Nerus" ~ "Kuala Terengganu",
TRUE ~ district
))
return(data)
}
Apply the renaming function to crime data:
crime_filtered <- rename_districts(crime_filtered)
Combine the rows that now fall in the same district after renaming:
crime_filtered <- crime_filtered %>%
  group_by(state, district, category, type, year) %>%
  summarize(crimes = sum(crimes), .groups = 'drop')
Do the same for the other aspatial data:
income <- rename_districts(income)
income <- income %>%
  group_by(state, district, year) %>%
  summarize(
    income_mean = sum(income_mean),
    income_median = sum(income_median),
    .groups = 'drop'
  )
inequality <- rename_districts(inequality)
inequality <- inequality %>%
  group_by(state, district, year) %>%
  summarize(
    inequality = sum(gini),
    .groups = 'drop'
  )
poverty <- rename_districts(poverty)
poverty <- poverty %>%
  group_by(state, district, year) %>%
  summarize(
    poverty_relative = sum(poverty_relative),
    poverty_absolute = sum(poverty_absolute),
    .groups = 'drop'
  )
Check if there are still any mismatched district names:
setdiff(unique(crime_filtered$district), unique(msia$ADM2_EN))
character(0)
setdiff(unique(income$district), unique(msia$ADM2_EN))
character(0)
setdiff(unique(inequality$district), unique(msia$ADM2_EN))
character(0)
setdiff(unique(poverty$district), unique(msia$ADM2_EN))
character(0)
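The aggregation step above sums every measure when several source districts collapse into one polygon. That is natural for crime counts, but for rates and indices (Gini, mean or median income, poverty rates) a sum over merged districts can leave the measure's natural range, for example a Gini coefficient above 1. One possible alternative, sketched here on made-up numbers, is an unweighted mean; a population-weighted mean would likely be better if population figures were available. This is an illustration of an alternative, not what the analysis in this document does, and the results shown later were produced with the sums.

```r
library(dplyr)

# Toy rows: two source districts that were both renamed to "Batang Padang"
# (gini values are made up for illustration)
toy_inequality <- tibble(
  state    = c("Perak", "Perak", "Perak"),
  district = c("Batang Padang", "Batang Padang", "Kinta"),
  year     = c(2022, 2022, 2022),
  gini     = c(0.30, 0.34, 0.36)
)

# Average instead of sum so the index stays on its original scale
merged <- toy_inequality %>%
  group_by(state, district, year) %>%
  summarise(inequality = mean(gini), .groups = "drop")

print(merged)
```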
Joining of data
Joining the aspatial data together
combined_data <- crime_filtered %>%
  left_join(poverty, by = c("state", "district", "year")) %>%
  left_join(inequality, by = c("state", "district", "year")) %>%
  left_join(income, by = c("state", "district", "year"))
Filtering out redundant columns and keeping only the relevant ones
msia_geometry <- msia %>%
  select(1, 13:15)
Joining with geospatial data
combined_data <- combined_data %>%
  left_join(msia_geometry, by = c("district" = "ADM2_EN"))
GWR
Type of crime selection
For our Shiny app, we would like users to be able to freely choose the type of crime to view. This could be an individual type or all types.
To keep only murder:
combined_data_murder <- combined_data %>%
  filter(type == "murder")
To keep only causing_injury:
combined_data_injury <- combined_data %>%
  filter(type == "causing_injury")
To keep all types of crime:
combined_data_all <- combined_data %>%
  filter(type == "all")
To keep causing_injury and murder in 2022 only:
combined_data_filtered <- combined_data %>%
  filter(type %in% c("causing_injury", "murder") & year %in% c(2022))
While filter(type == "causing_injury") is concise, filter(type %in% c("causing_injury")) accepts one or more values. This is beneficial when we want to give users the freedom to select their own variable(s).
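The %in% pattern above can be wrapped in a small helper so that the same code serves a single selection or several. The sketch below assumes a hypothetical helper name, filter_crime(), and runs on a made-up table rather than combined_data.

```r
library(dplyr)

# Hypothetical helper: keep rows matching any selected crime type and year
filter_crime <- function(data, types, years) {
  data %>%
    filter(type %in% types, year %in% years)
}

# Toy data with the same column names as combined_data (values are made up)
toy <- tibble(
  district = c("A", "A", "B", "B"),
  type     = c("murder", "causing_injury", "murder", "all"),
  year     = c(2019, 2022, 2022, 2022),
  crimes   = c(1, 20, 3, 50)
)

# A single type across both years
one_type <- filter_crime(toy, types = "murder", years = c(2019, 2022))

# Several types in one year
two_types <- filter_crime(toy, types = c("murder", "causing_injury"), years = 2022)
```

In a Shiny server, types and years would come straight from the user's input values, which are already character or numeric vectors of length one or more.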
Correlation matrix
We would also like for the user to be able to view the correlation matrix so that they can freely choose which independent variables to be used on GWR conducted in our application.
For murder cases:
ggcorrmat(combined_data_murder[, 6:11])
For causing injury cases:
ggcorrmat(combined_data_injury[, 6:11])
An interesting parameter of ggcorrmat() is sig.level, which sets the significance level used in the matrix. This gives users more control over how the correlations are viewed.
ggcorrmat(combined_data_all[, 6:11], sig.level = 0.05)
Note: Income median and mean are highly correlated since they are similar measures from the same data set. Either variable might be removed for GWR in the Shiny app.

The correlation analysis will sit under a tabset that users can toggle to from the main GWR view. On the side panel, selectInput() will be used to select the independent variables to view and the type of crime, checkboxGroupInput() to select the year(s), and sliderInput() to select the significance level of the correlation matrix.
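The side panel described above could be prototyped roughly as follows. This is a sketch under assumptions: every input ID, label, default and the slider range are placeholders invented for this example, while the variable names, crime types and years are the ones used earlier in this document.

```r
library(shiny)

# Sidebar sketch: input IDs, labels, defaults and ranges are placeholders
corr_sidebar <- sidebarPanel(
  selectInput("corr_vars", "Independent variables",
              choices = c("poverty_relative", "poverty_absolute",
                          "inequality", "income_mean", "income_median"),
              multiple = TRUE),
  selectInput("corr_type", "Type of crime",
              choices = c("all", "murder", "causing_injury")),
  checkboxGroupInput("corr_years", "Year(s)",
                     choices = c(2019, 2022), selected = 2022),
  sliderInput("corr_sig", "Significance level",
              min = 0.01, max = 0.10, value = 0.05, step = 0.01)
)
```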
Linear regression model
To allow users to freely select the independent variables and the type of crime, we need a function that accepts these inputs dynamically.
run_regression <- function(data, response, predictors) {
  # Create formula from response and predictors
  formula <- as.formula(
    paste(response, "~", paste(predictors, collapse = " + "))
  )
  # Run the linear model
  model <- lm(formula = formula, data = data)
  return(model)
}
An example of using this function with all the variables as the independent variables and only using murder as the type of crime:
# Define predictors as a vector of variable names
predictors <- c("poverty_relative", "poverty_absolute", "inequality", "income_mean", "income_median")
# Run the function with the specified data and predictors
murder_model <- run_regression(
data = combined_data_murder,
response = "crimes",
predictors = predictors
)
ols_regress(murder_model)
Model Summary
----------------------------------------------------------------
R 0.695 RMSE 2.760
R-Squared 0.483 MSE 7.797
Adj. R-Squared 0.472 Coef. Var 132.240
Pred R-Squared 0.444 AIC 1279.749
MAE 1.805 SBC 1304.674
----------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------
Regression 1847.330 5 369.466 47.386 0.0000
Residual 1980.435 254 7.797
Total 3827.765 259
---------------------------------------------------------------------
Parameter Estimates
------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
------------------------------------------------------------------------------------------------
(Intercept) 0.165 0.582 0.283 0.777 -0.981 1.310
poverty_relative 0.029 0.020 0.104 1.509 0.133 -0.009 0.068
poverty_absolute 0.018 0.024 0.057 0.757 0.450 -0.029 0.065
inequality -22.041 3.030 -0.742 -7.274 0.000 -28.008 -16.073
income_mean 0.005 0.000 3.634 10.779 0.000 0.004 0.006
income_median -0.005 0.001 -2.757 -8.686 0.000 -0.006 -0.004
------------------------------------------------------------------------------------------------
An example of using this function with absolute poverty, inequality and median income as the independent variables, and all types of crime:
# Define predictors as a vector of variable names
predictors <- c("poverty_absolute", "inequality", "income_median")
# Run the function with the specified data and predictors
all_model <- run_regression(
data = combined_data_all,
response = "crimes",
predictors = predictors
)
ols_regress(all_model)
Model Summary
--------------------------------------------------------------------
R 0.408 RMSE 595.003
R-Squared 0.167 MSE 356772.709
Adj. R-Squared 0.162 Coef. Var 232.526
Pred R-Squared 0.142 AIC 8129.805
MAE 271.215 SBC 8151.074
--------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------------
Regression 36800562.495 3 12266854.165 34.383 0.0000
Residual 184094717.629 516 356772.709
Total 220895280.123 519
---------------------------------------------------------------------------
Parameter Estimates
-----------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
-----------------------------------------------------------------------------------------------------
(Intercept) 31.321 87.811 0.357 0.721 -141.190 203.832
poverty_absolute -0.599 3.467 -0.011 -0.173 0.863 -7.410 6.211
inequality -1289.314 375.395 -0.256 -3.435 0.001 -2026.804 -551.823
income_median 0.141 0.018 0.473 7.924 0.000 0.106 0.176
-----------------------------------------------------------------------------------------------------
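Since the response and predictors will eventually arrive as strings from user inputs, it may help to validate them before fitting. The sketch below is a hypothetical variant of run_regression(), named run_regression_checked() for this example, that stops early on unknown column names and builds the formula with base R's reformulate(). The built-in mtcars data stands in for the crime data.

```r
# Hypothetical variant of run_regression() that checks names first
run_regression_checked <- function(data, response, predictors) {
  missing_vars <- setdiff(c(response, predictors), names(data))
  if (length(missing_vars) > 0) {
    stop("Columns not found: ", paste(missing_vars, collapse = ", "))
  }
  # reformulate() builds response ~ predictor1 + predictor2 + ...
  lm(reformulate(predictors, response = response), data = data)
}

# mtcars stands in for the crime data here
fit <- run_regression_checked(mtcars, "mpg", c("wt", "hp"))
names(coef(fit))
```

An explicit error message of this kind is easier to surface in a Shiny UI than the parsing error that a malformed formula string would produce.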
Stepwise regression model selection
While giving users the freedom to control the model used, we should also provide information that helps them choose a model that suits their needs.
Function for selecting the type of stepwise regression:
run_stepwise_selection <- function(model, direction = "forward", p_val = 0.05, details = FALSE) {
  if (!direction %in% c("forward", "backward", "both")) {
    stop("Invalid direction. Choose from 'forward', 'backward', or 'both'.")
  }
  # Caution: depending on the olsrr version, ols_step_both_p() may expect
  # p_enter / p_remove rather than p_val; check ?ols_step_both_p
  stepwise_model <- switch(
    direction,
    "forward" = ols_step_forward_p(model, p_val = p_val, details = details),
    "backward" = ols_step_backward_p(model, p_val = p_val, details = details),
    "both" = ols_step_both_p(model, p_val = p_val, details = details)
  )
  return(stepwise_model)
}
Forward selection
murder_fw_mlr <- run_stepwise_selection(
model = murder_model,
direction = "forward",
p_val = 0.05,
details = FALSE
)
print(murder_fw_mlr)
Stepwise Summary
------------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
------------------------------------------------------------------------------
0 Base Model 1441.080 1448.202 701.849 0.00000 0.00000
1 income_mean 1374.046 1384.728 634.769 0.23319 0.23022
2 inequality 1343.382 1357.625 604.139 0.32372 0.31846
3 income_median 1279.575 1297.379 541.796 0.47494 0.46879
------------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
----------------------------------------------------------------
R 0.689 RMSE 2.780
R-Squared 0.475 MSE 7.851
Adj. R-Squared 0.469 Coef. Var 132.696
Pred R-Squared 0.445 AIC 1279.575
MAE 1.853 SBC 1297.379
----------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------
Regression 1817.974 3 605.991 77.189 0.0000
Residual 2009.792 256 7.851
Total 3827.765 259
---------------------------------------------------------------------
Parameter Estimates
---------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
---------------------------------------------------------------------------------------------
(Intercept) 0.004 0.575 0.007 0.995 -1.129 1.137
income_mean 0.005 0.000 3.505 10.577 0.000 0.004 0.006
inequality -17.547 1.825 -0.591 -9.616 0.000 -21.141 -13.954
income_median -0.005 0.001 -2.710 -8.587 0.000 -0.006 -0.004
---------------------------------------------------------------------------------------------
Backward Elimination
murder_bw_mlr <- run_stepwise_selection(
model = murder_model,
direction = "backward",
p_val = 0.05,
details = FALSE
)
print(murder_bw_mlr)
Stepwise Summary
---------------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
---------------------------------------------------------------------------------
0 Full Model 1279.749 1304.674 542.184 0.48261 0.47243
1 poverty_absolute 1278.335 1299.699 540.699 0.48145 0.47331
2 poverty_relative 1279.575 1297.379 541.796 0.47494 0.46879
---------------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
----------------------------------------------------------------
R 0.689 RMSE 2.780
R-Squared 0.475 MSE 7.851
Adj. R-Squared 0.469 Coef. Var 132.696
Pred R-Squared 0.445 AIC 1279.575
MAE 1.853 SBC 1297.379
----------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------
Regression 1817.974 3 605.991 77.189 0.0000
Residual 2009.792 256 7.851
Total 3827.765 259
---------------------------------------------------------------------
Parameter Estimates
------------------------------------------------------------------------------------
model                Beta  Std. Error  Std. Beta        t    Sig     lower     upper
------------------------------------------------------------------------------------
(Intercept)         0.004       0.575               0.007  0.995    -1.129     1.137
inequality        -17.547       1.825     -0.591   -9.616  0.000   -21.141   -13.954
income_mean         0.005       0.000      3.505   10.577  0.000     0.004     0.006
income_median      -0.005       0.001     -2.710   -8.587  0.000    -0.006    -0.004
------------------------------------------------------------------------------------
Bidirectional Elimination
murder_sb_mlr <- run_stepwise_selection(
model = murder_model,
direction = "both",
p_val = 0.05,
details = FALSE
)
print(murder_sb_mlr)
Stepwise Summary
-------------------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
-------------------------------------------------------------------------------------
0 Base Model 1441.080 1448.202 701.849 0.00000 0.00000
1 income_mean (+) 1374.046 1384.728 634.769 0.23319 0.23022
2 inequality (+) 1343.382 1357.625 604.139 0.32372 0.31846
3 income_median (+) 1279.575 1297.379 541.796 0.47494 0.46879
4 poverty_relative (+) 1278.335 1299.699 540.699 0.48145 0.47331
-------------------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
----------------------------------------------------------------
R 0.694 RMSE 2.763
R-Squared 0.481 MSE 7.784
Adj. R-Squared 0.473 Coef. Var 132.130
Pred R-Squared 0.448 AIC 1278.335
MAE 1.813 SBC 1299.699
----------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------
Regression 1842.865 4 460.716 59.188 0.0000
Residual 1984.900 255 7.784
Total 3827.765 259
---------------------------------------------------------------------
Parameter Estimates
---------------------------------------------------------------------------------------
model                   Beta  Std. Error  Std. Beta        t    Sig     lower     upper
---------------------------------------------------------------------------------------
(Intercept)            0.102       0.575               0.178  0.859    -1.031     1.236
income_mean            0.005       0.000      3.623   10.767  0.000     0.004     0.006
inequality           -20.859       2.594     -0.702   -8.040  0.000   -25.968   -15.750
income_median         -0.005       0.001     -2.774   -8.770  0.000    -0.006    -0.004
poverty_relative       0.033       0.019      0.119    1.788  0.075    -0.003     0.070
---------------------------------------------------------------------------------------
Visualising model metrics
metric <- compare_performance(murder_model,
                              murder_fw_mlr$model,
                              murder_bw_mlr$model,
                              murder_sb_mlr$model)
metric$Name <- gsub(".*\\\\([a-zA-Z0-9_]+)\\\\, \\\\model\\\\.*", "\\1", metric$Name)
plot(metric)
While the plot looks fine, the model names are confusing; special care should be taken so that each label tells the reader which model it refers to.
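One way to make the labels readable is to overwrite the Name column with explicit labels before plotting. The sketch below is only an illustration: it uses two stand-in models fitted on the built-in mtcars data rather than the crime models, the label text is made up, and it assumes compare_performance() returns its rows in the order the models were passed.

```r
library(performance)

# Stand-in models on a built-in dataset (mtcars), not the crime data
full_model    <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
reduced_model <- lm(mpg ~ wt + hp, data = mtcars)

metric <- compare_performance(full_model, reduced_model)

# Hypothetical reader-friendly labels, one per model, in the order passed above
model_labels <- c("Full model", "Reduced model")
stopifnot(nrow(metric) == length(model_labels))
metric$Name <- model_labels

# Plotting a comparison table relies on the 'see' package
if (requireNamespace("see", quietly = TRUE)) {
  plot(metric)
}
```

For the four murder models, the same idea would use four labels in the order the models are passed to compare_performance().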

The correlation analysis will sit under a tabset so that the user can toggle to it from the main GWR view. On the side panel, selectInput() will be used to select the type of crime and the independent variables to view, checkboxGroupInput() to select the year(s), and sliderInput() to select the p-value of the regression models.
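The side panel and tabset described above could be laid out roughly as follows. This is a minimal sketch, not the final app: the input ids, tab titles, slider range, the second crime type and the list of years are all placeholders, and only "murder" and the predictor names come from this exercise.

```r
library(shiny)

# Placeholder choices: "other_crime" and the years are hypothetical
crime_types <- c("murder", "other_crime")
predictors  <- c("income_mean", "income_median", "inequality",
                 "poverty_relative", "poverty_absolute")
years       <- 2019:2022

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      selectInput("crime_type", "Type of crime", choices = crime_types),
      selectInput("predictors", "Independent variables",
                  choices = predictors, selected = predictors,
                  multiple = TRUE),
      checkboxGroupInput("years", "Year(s)",
                         choices = years, selected = years),
      sliderInput("p_val", "p-value threshold",
                  min = 0.01, max = 0.10, value = 0.05, step = 0.01)
    ),
    mainPanel(
      tabsetPanel(
        tabPanel("GWR", p("Main GWR view goes here.")),
        tabPanel("Regression models", verbatimTextOutput("selection"))
      )
    )
  )
)

server <- function(input, output, session) {
  # Echo the current selections; the real app would refit the models here
  output$selection <- renderPrint({
    list(crime = input$crime_type, predictors = input$predictors,
         years = input$years, p_val = input$p_val)
  })
}

# shinyApp(ui, server)  # uncomment to launch the prototype
```

In the real app, input$p_val would presumably be passed as the p_val argument of the stepwise selection calls.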
Checks
Linearity test
out <- plot(check_model(murder_sb_mlr$model,
                        panel = FALSE))
For confidence bands, please install `qqplotr`.
out[[2]]
Normality test
plot(check_normality(murder_sb_mlr$model))
For confidence bands, please install `qqplotr`.

A histogram might be a better way for the user to see the distribution of the residuals at a glance.
ols_plot_resid_hist(murder_sb_mlr$model)
Outliers
plot(check_outliers(murder_sb_mlr$model,
method = "cook"))
Multicollinearity
plot(check_collinearity(murder_sb_mlr$model)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Variable `Component` is not in your data frame :/


The checks will be made up of the linearity, normality, outlier, and multicollinearity plots. They will sit under a tabset so that the user can toggle to them from the main GWR view. On the side panel, selectInput() will be used to select the type of crime and the independent variables to view, and checkboxGroupInput() to select the year(s).
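On the server side, the four checks could be routed through one small helper so that each tab (or a single selector) asks for a plot by name. The helper name diagnostic_plot() and the check labels are hypothetical; the plotting calls inside it are the same ones used in the sections above, and the demo model is a stand-in fitted on mtcars.

```r
library(performance)

# Hypothetical helper: returns the diagnostic plot for one named check.
# Only the matching branch of switch() is evaluated.
diagnostic_plot <- function(model,
                            check = c("linearity", "normality",
                                      "outliers", "multicollinearity")) {
  check <- match.arg(check)  # errors on an unknown check name
  switch(check,
    linearity         = plot(check_model(model, panel = FALSE))[[2]],
    normality         = plot(check_normality(model)),
    outliers          = plot(check_outliers(model, method = "cook")),
    multicollinearity = plot(check_collinearity(model))
  )
}

# Stand-in model on a built-in dataset, not the crime data
demo_model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# diagnostic_plot(demo_model, "outliers")  # plotting needs the 'see' package
```

Inside the Shiny server this could be wrapped as renderPlot(diagnostic_plot(model, input$check)), where input$check is an assumed input id.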
Testing for Spatial Autocorrelation
mlr_output <- as.data.frame(murder_sb_mlr$model$residuals) %>%
  rename(`SB_MLR_RES` = `murder_sb_mlr$model$residuals`)
Due to empty rows, the residuals need to be padded with NA so that they line up with the full data:
murder_residual <- data.frame(MLR_RES = rep(NA, nrow(combined_data_murder)))
murder_residual[rownames(mlr_output), "MLR_RES"] <- mlr_output$SB_MLR_RES
combined_data_murder <- cbind(combined_data_murder,
                              murder_residual)
combined_data_murder_st <- st_as_sf(combined_data_murder)
Visualising:
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(combined_data_murder_st) +
  tm_polygons(col = "MLR_RES", alpha = 0.6) +
  tm_view(set.zoom.limits = c(5, 9))
Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
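The NA padding used to realign the residuals before mapping can be checked on a toy example. The data frame below is made up for illustration and assumes default row names. It also shows an alternative that is not used in this exercise: fitting with na.action = na.exclude makes residuals() return a vector already padded with NA.

```r
# Toy data with one missing response; lm() drops row 3 (default na.omit)
toy <- data.frame(y = c(1.0, 2.1, NA, 3.9, 5.2), x = 1:5)
fit <- lm(y ~ x, data = toy)
res <- residuals(fit)  # four residuals, named by the rows that were kept

# Pad to the full number of rows, matching on row names
padded <- data.frame(MLR_RES = rep(NA_real_, nrow(toy)))
padded[names(res), "MLR_RES"] <- res
toy <- cbind(toy, padded)

# Alternative: na.exclude keeps track of dropped rows, so the
# residuals come back with NA in the dropped positions
fit_ex <- lm(y ~ x, data = toy, na.action = na.exclude)
res_ex <- residuals(fit_ex)
```

Both routes give a residual column with NA in row 3; the manual padding mirrors what is done with murder_residual.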